Estimating defectiveness of source code: A predictive model using GitHub content
Authors
Abstract
This paper presents two key contributions: (i) a method for building a dataset of source code features extracted from Open Source Software (OSS) source files and their associated bug reports, and (ii) a predictive model for estimating the defectiveness of given source code. These artifacts can be useful for building tools and techniques in several automated software engineering areas, such as bug localization, code review, recommendation, and program repair. To achieve our goal, we first extract coding style information (e.g., related to the programming language constructs used in the source code) for source code files present on GitHub. We then extract the information available in the bug reports (if any) associated with these source code files. The unstructured or semi-structured information fetched in this way is then transformed into a structured knowledge base. We considered more than 30400 source code files from 20 different GitHub repositories, with about 14950 associated bug reports across 4 bug tracking portals. The source code files considered are written in four programming languages (C, C++, Java, and Python) and belong to different types of applications. A machine learning (ML) model for estimating the defectiveness of a given input source code is then trained using the knowledge base. To pick the best ML model, we evaluated 8 different ML algorithms, such as Random Forest, K Nearest Neighbor, and SVM, with around 50 parameter configurations to compare their performance on our tasks. One of our findings is that the best k-fold (k = 5) cross-validation results are obtained with the NuSVM technique, which gives a mean F1 score of 0.914.
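The model-selection step described in the abstract (5-fold cross-validation, compared by mean F1 score) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the two numeric features and the `make_toy_data` generator are hypothetical stand-ins for the real knowledge base, and a small hand-written K Nearest Neighbor classifier is used in place of the eight algorithms (including NuSVM) that the paper evaluates.

```python
import random
from statistics import mean


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors closest training points."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))
    nearest = sorted(range(len(train_X)), key=sq_dist)[:n_neighbors]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > n_neighbors else 0


def cross_val_f1(X, y, n_neighbors, k=5, seed=0):
    """Mean F1 over k folds, each fold held out once for testing."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        preds = [knn_predict(train_X, train_y, X[i], n_neighbors)
                 for i in held_out]
        scores.append(f1_score([y[i] for i in held_out], preds))
    return mean(scores)


def make_toy_data(n=200, seed=1):
    """Hypothetical data: label 1 = defective, with shifted feature means."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(1.0 * label, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
# Compare a few parameter configurations by mean 5-fold F1.
results = {n: cross_val_f1(X, y, n_neighbors=n) for n in (1, 3, 5, 7)}
best = max(results, key=results.get)
```

Here `results` maps each candidate `n_neighbors` value to its mean F1 score and `best` is the configuration with the highest score. The same loop extends to other algorithms by swapping the prediction function.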
Similar resources
Using StackOverflow content to assist in code review
An important goal for programmers is to minimize the cost of identifying and correcting defects in source code. Code review is commonly used for identifying programming defects. However, manual code review has some shortcomings: a) it is time-consuming, and b) its outcomes are subjective and depend on the skills of the reviewers. An automated approach for assisting in code reviews is thus highly desirable. We ...
Summarizing Git Commits and GitHub Pull Requests Using Sequence to Sequence Neural Attention Models
Every day millions of developers and programmers push commits to GitHub to ensure their projects are version controlled, reproducible, and remotely accessible. There are nearly 20 million public repositories (collections of source code in the form of projects) on GitHub today, and over 16 million unique users. Users are able to commit additions or changes to their own repositories, as well as t...
Correction: A Fast Incremental Gaussian Mixture Model
The Data Availability Statement for this paper is incorrect. The correct Data Availability Statement is: Data are available at Figshare (http://figshare.com/articles/A_Fast_Incremental_Gaussian_Mixture_Model/1552030). The MNIST data set is available at (http://yann.lecun.com/exdb/mnist/) and the CIFAR10 data set is available at (http://www.cs.toronto.edu/~kriz/cifar.html). The software binar...
Estimating Web Service Quality of Service Parameters using Source Code Metrics and LSSVM
We conduct an empirical analysis to investigate the relationship between thirty-seven different source code metrics and fifteen different Web Service QoS (Quality of Service) parameters. The source code metrics used in our experiments consist of nineteen Object-Oriented metrics, six Baski and Misra metrics, and twelve Harry M. Sneed metrics. We apply Principal Component Analysis (PCA) and Rou...
Estimation of dosimetric parameters of I-125 brachytherapy source model 6711 using GATE8.1 code
Brachytherapy is a type of internal radiation therapy in which radiation sources, which are usually encapsulated, are placed as close as possible to the tumor site inside the patient's body. In this technique, it is important to determine the dose distribution around the brachytherapy capsule. In this paper, dosimetric parameters of I-125 brachytherapy source model 6711 are estimated accordin...
Publication date: 2018